Automated Detection, Segmentation, and Classification of Pleural Effusion From Computed Tomography Scans Using Machine Learning

Objective This study trained and evaluated algorithms to detect, segment, and classify simple and complex pleural effusions on computed tomography (CT) scans. Materials and Methods For detection and segmentation, we randomly selected 160 chest CT scans out of all consecutive patients (January 2016–January 2021, n = 2659) with reported pleural effusion. Effusions were manually segmented and a negative cohort of chest CTs from 160 patients without effusions was added. A deep convolutional neural network (nnU-Net) was trained and cross-validated (n = 224; 70%) for segmentation and tested on a separate subset (n = 96; 30%) with the same distribution of reported pleural complexity features as in the training cohort (eg, hyperdense fluid, gas, pleural thickening and loculation). On a separate consecutive cohort with a high prevalence of pleural complexity features (n = 335), a random forest model was implemented for classification of segmented effusions with Hounsfield unit thresholds, density distribution, and radiomics-based features as input. As performance measures, sensitivity, specificity, and area under the curves (AUCs) for detection/classifier evaluation (per-case level) and Dice coefficient and volume analysis for the segmentation task were used. Results Sensitivity and specificity for detection of effusion were excellent at 0.99 and 0.98, respectively (n = 96; AUC, 0.996, test data). Segmentation was robust (median Dice, 0.89; median absolute volume difference, 13 mL), irrespective of size, complexity, or contrast phase. The sensitivity, specificity, and AUC for classification in simple versus complex effusions were 0.67, 0.75, and 0.77, respectively. Conclusion Using a dataset with different degrees of complexity, a robust model was developed for the detection, segmentation, and classification of effusion subtypes. The algorithms are openly available at https://github.com/usb-radiology/pleuraleffusion.git.

Computer-aided quantification and diagnosis systems have become widely available in thoracic radiology, and various pathologies can be automatically detected, segmented, and classified on chest radiographs and computed tomography (CT). [1][2][3] For pleural disease, effusions can be detected accurately from radiographs, also with deep learning-based image analysis. 4 However, the occurrence 5 and amount 6,7 of effusions are independent prognostic indicators. This became evident in the COVID-19 pandemic when infected patients with pleural effusions had a higher incidence of severe courses, prolonged hospital stays, and higher mortality rates. 8 Compared with radiography, CT provides accurate pleural effusion quantification; nevertheless, in radiology reports, effusions are commonly described only qualitatively because manual delineation is time-consuming. Automated quantification methods based on traditional image processing or atlas segmentation have resulted in moderate performance, have not included effusion-free control cohorts, or had limited sample sizes. 9,10 Computed tomography is especially relevant for a detailed assessment of effusion subtypes (ie, hemothorax, empyema, malignant effusion, and pneumothorax) and for detection of the causative diagnosis. 11 Additional pleural complexity features, such as hyperdense fluid, pleural thickening, gas, and loculation, are used to differentiate between serous and these more complex effusion subtypes [12][13][14] (from now on referred to as simple and complex effusions, respectively). This differentiation has implications for patient management [15][16][17] and outcome, 18,19 whereas machine learning models could also be used for CT-guided planning and fast detection of associated periprocedural pneumothorax and hemothorax. 20,21

Hypothesis and Purpose
We aimed to develop machine learning models that (1) accurately detect and (2) robustly segment pleural effusions. Our third aim was (3) the classification of effusions into simple versus complex pleural effusions with random forest classifiers.

MATERIALS AND METHODS
The local ethics committee approved this retrospective study (Project ID 2021-00946).

Study Population
The study population consisted of cases with and without pleural effusion, defined as the positive and negative cohort, respectively. For the positive cohort, 2659 consecutive patients were retrospectively identified with chest CT scans performed at our tertiary hospital between January 2016 and January 2021 containing the term "pleural effusion" in the radiological report (Fig. 1). We then randomly selected 160 CTs for segmentation of lungs and pleural cavity, preserving the distribution of pleural complexity features as in the whole cohort (Text, Supplementary Digital Content 1, http://links.lww.com/RLI/A689). For the negative cohort, we selected an equal number of CT datasets (n = 160) from our institutional database as previously described 22 and conducted a secondary image review for the presence of pleural effusion (reader 1, R.S., postgraduate year [PGY] 4). Our considerations on sample size estimation are provided in Supplementary Digital Content 2 (text), http://links.lww.com/RLI/A690.
After review by reader 1, detection performance was externally validated on all patients in the public National Lung Screening Trial (NLST) dataset 23 (n = 1061, 2234 CTs with soft tissue kernel). External validation of the segmentation performance was performed using data from the publicly available PleThora project (n = 34), 24 consisting of manual reference segmentations of the thoracic cavity and effusions in patients with non-small cell lung cancer.
For classification of pleural disease, we selected all patients of the positive cohort with biopsy or thoracocentesis within 7 days of the chest CT examination (n = 335).

CT Acquisition Parameters
Scans were acquired using 3 different CT scanners: Somatom Definition Flash (n = 284, 2 × 128-slice system), Somatom Definition AS+ (n = 262, 128-slice system), and Somatom Definition Edge (n = 109, 128-slice system; all scanners: Siemens Healthineers, Erlangen, Germany). The peak kilovoltage was 120 kVp and an automated tube current modulation was performed. The contrast agent Iopromide (Ultravist 370, Bayer Pharmaceuticals, Berlin, Germany) was administered in 301 (arterial phase, n = 70; biphasic, n = 93; venous, n = 15; CT pulmonary angiography, n = 123; n = 208 in the classification cohort) of the 655 CT studies at a standard injection rate of approximately 4.0 mL/s and a body weight-adapted volume of up to 120 mL. A soft tissue kernel (30f) of 1.0 mm served as the only input for the algorithm.

Pleural Effusion Detection and Segmentation
Reader 1 manually segmented the pleural cavity, after processing the original 3-dimensional (3D) chest CTs in a medical image software as previously described. 22 The segmentations were then exported with separate labels for lung, pleura, and background.
To measure interrater variability, 12 studies were randomly selected from the test dataset and were segmented by reader 2 (T.W., in-training, PGY5) and reader 3 (Julien Poletti, in-training, PGY1), who were blinded to the radiology reports. To measure intrarater variability, the same cases were segmented again by reader 1, blinded to and 4 weeks apart from the initial segmentation.
The deep learning model was trained with nnU-Net, which is self-configuring in terms of preprocessing, architecture selection, training, and postprocessing (Table, Supplementary Digital Content 3, http://links.lww.com/RLI/A691). 2 An ensemble from the 5-fold cross-validation models was used for inference. All processing was performed in Matlab R2018b and Python 3.7.
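The ensemble step can be pictured as averaging the per-voxel class probabilities of the 5 fold models and assigning each voxel its most probable label. The following is a minimal pure-Python sketch under that assumption; the function name `ensemble_labels`, the label order, and the toy probabilities are hypothetical, and the actual inference is handled internally by nnU-Net.

```python
from typing import List, Sequence

# Label order is an assumption made for this illustration only.
LABELS = ("background", "lung", "pleura")


def ensemble_labels(fold_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities over folds, then take the argmax per voxel.

    fold_probs[f][v][c] is the probability of class c at voxel v
    predicted by the model trained on fold f.
    """
    n_folds = len(fold_probs)
    n_voxels = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    labels = []
    for v in range(n_voxels):
        mean_probs = [
            sum(fold_probs[f][v][c] for f in range(n_folds)) / n_folds
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=mean_probs.__getitem__))
    return labels


# Toy input: 5 folds, 2 voxels, 3 classes (all values are made up).
toy = [
    [[0.70, 0.20, 0.10], [0.1, 0.3, 0.6]],
    [[0.60, 0.30, 0.10], [0.2, 0.5, 0.3]],
    [[0.80, 0.10, 0.10], [0.1, 0.2, 0.7]],
    [[0.50, 0.40, 0.10], [0.2, 0.2, 0.6]],
    [[0.90, 0.05, 0.05], [0.1, 0.4, 0.5]],
]
predicted = ensemble_labels(toy)
```

In the toy input, one fold disagrees with the others on the second voxel, yet the averaged probabilities still favor the pleura label, which illustrates why an ensemble tends to be more stable than any single fold model.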

Pleural Effusion Classification
We defined effusions with additional pleural complexity features as complex and effusions without complexity features as simple. In the 335 cases for effusion classification, the additional pleural complexity features "hyperdense fluid," "pleural thickening," "gas," and "loculation" were visually determined by reader 1 in consensus with reader 4 (J.B., PGY 29) and radiologically defined as follows:
Hyperdense fluid: Density values greater than 15.6 HU in the pleural cavity, 25 not otherwise explained, for example, by artifacts. Additional potential indicators such as rib fractures, postoperative changes, pleural fluid sedimentation, or pleural contrast extravasation confirmed the diagnosis, if present.
Pleural thickening: Nodular or smooth pleural line as seen in the soft tissue kernel.
Gas: Density values less than −850 HU within the pleural cavity resembling microbubbles (gas surrounded by pleural fluid) and/or pneumothorax.
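The density thresholds in these definitions can be illustrated with a short sketch that counts the fraction of voxels inside a pleural mask exceeding 15.6 HU or falling below −850 HU. This is only an illustration under stated assumptions: the function `density_fractions`, the returned key names, and the toy voxel values are hypothetical and do not reproduce the features used by the classification models.

```python
from typing import Dict, Iterable

HYPERDENSE_HU = 15.6  # threshold quoted in the text for hyperdense fluid
GAS_HU = -850.0       # threshold quoted in the text for gas


def density_fractions(pleural_hu: Iterable[float]) -> Dict[str, float]:
    """Return the share of pleural voxels above/below the two HU thresholds."""
    values = list(pleural_hu)
    if not values:
        raise ValueError("pleural mask contains no voxels")
    n = len(values)
    n_hyper = sum(1 for hu in values if hu > HYPERDENSE_HU)
    n_gas = sum(1 for hu in values if hu < GAS_HU)
    return {"hyperdense_fraction": n_hyper / n, "gas_fraction": n_gas / n}


# Toy HU values of voxels inside a segmented effusion (made up).
toy_hu = [5.0, 8.0, 12.0, 20.0, 40.0, -900.0, -950.0, 3.0, 10.0, 0.0]
fractions = density_fractions(toy_hu)
```

Such fractions would only flag candidate voxels; as the definitions state, findings explained by artifacts still need to be excluded by a reader.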
In addition, the classification dataset was dichotomized based on the resulting diagnosis of serous effusion from biopsy or thoracocentesis in the test dataset. However, microscopic evidence of erythrocytes in an otherwise serous effusion was not rated as a complex effusion, as this can be periprocedural.
The sample was randomly split into training/validation and testing datasets (n = 234 and n = 101, 70% and 30%, respectively).
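As a quick check of the reported split sizes, the sketch below shuffles 335 case indices and cuts them at 70%. The helper `split_cases` and the seed are hypothetical; the study does not describe its splitting code.

```python
import random
from typing import List, Sequence, Tuple


def split_cases(
    case_ids: Sequence[int], train_percent: int = 70, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split case identifiers into training/validation and test sets."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_cases(range(335))
```

With 335 cases, the cut yields 234 training/validation and 101 test cases, matching the numbers given above.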

Classification Models
Random forest models (for details: Table, Supplementary Digital Content 5, http://links.lww.com/RLI/A693) were used for classification into simple and complex effusion as well as for the prediction of any underlying pleural complexity features ("hyperdense fluid," "pleural thickening," "gas," and "loculation"), resulting in a total of 5 models. Initially, the count of positive and negative cases for each classification task was inevitably unbalanced; therefore, the datasets were randomly downsampled to a 1:1 ratio. For each class, preliminary training was performed to select the most informative variables (50th percentile of feature importance of both interpretable complexity and radiomic features). Then, with the most important half of the variables, the models were further fine-tuned with a leave-one-out cross-validation. Finally, a model was trained for each of the 5 classification tasks and was evaluated on the test data with a test-positivity threshold greater than 0.5.
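Three of the steps above (1:1 downsampling, keeping the more important half of the variables, and the 0.5 test-positivity threshold) can be sketched without any machine learning library. The code below is a simplified illustration, not the study's implementation: the function names, the feature names, the importance values, and the handling of ties at the median are assumptions, and the random forest training itself is omitted.

```python
import random
from statistics import median
from typing import Dict, List, Sequence


def downsample_balanced(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return sorted indices with equal numbers of positives (1) and negatives (0)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return sorted(rng.sample(pos, n) + rng.sample(neg, n))


def select_top_half(importances: Dict[str, float]) -> List[str]:
    """Keep features whose importance is at or above the median (50th percentile)."""
    cutoff = median(importances.values())
    return [name for name, value in importances.items() if value >= cutoff]


def is_test_positive(probability: float, threshold: float = 0.5) -> bool:
    """A case counts as positive only if the predicted probability exceeds the threshold."""
    return probability > threshold


# Toy labels: 188 positives and 147 negatives (counts chosen to resemble an unbalanced task).
labels = [1] * 188 + [0] * 147
kept = downsample_balanced(labels)

# Hypothetical importances from a preliminary model.
toy_importances = {"feature_a": 0.4, "feature_b": 0.1, "feature_c": 0.3, "feature_d": 0.2}
selected = select_top_half(toy_importances)
```

Downsampling discards cases from the majority class, so the balanced set here contains 147 cases per class; this trades sample size for a classifier that is not biased toward the more frequent label.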

Statistical Analysis
To evaluate pleural effusion detection and classification performance, we used sensitivity, specificity, negative predictive value (NPV), positive predictive value, and ROC analysis. We used nonparametric tests to evaluate intergroup differences (Mann-Whitney U test for 2 variables and Kruskal-Wallis test for more than 2 variables). To evaluate the performance of the segmentation algorithm, we used Dice coefficient and intraclass correlation coefficient (ICC) to compare with human intrarater and interrater variability. Volumetric results were compared with Bland-Altman analysis and linear correlation with the Pearson coefficient.
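For reference, the per-case accuracy measures and the Dice coefficient named above follow directly from confusion-matrix counts and voxel overlap. The sketch below uses hypothetical function names and toy data; returning 1.0 for two empty masks is a convention assumed here, not one stated in the text.

```python
from typing import Dict, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from binary per-case labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def dice(reference: Set[Voxel], predicted: Set[Voxel]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not reference and not predicted:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(reference & predicted) / (len(reference) + len(predicted))


# Toy per-case labels (1 = effusion present) and toy voxel masks.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(truth, pred)

manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
automated = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
overlap = dice(manual, automated)
```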
For classification, diagnostic accuracy measures are reported separately both for the radiological absence or presence of pleural complexity features (simple and complex effusion, respectively) and based on reports from biopsy or thoracocentesis (serous effusion as simple; presence of pleural empyema or pleural carcinomatosis as complex effusion).
A P value of <0.05 was considered statistically significant. All statistical analyses were performed in R 4.0.5 (R Core Team, Vienna, Austria). All results in the main text refer to the respective test datasets for segmentation and classification, whereas the respective results of the cross-validation for detection (
Based on the manual reference standard of the segmentation (n = 160), 74 of the 160 CT examinations showed bilateral pleural effusions. The total effusion volume ranged from 2 to 2318 mL (mean [SD], 285 [402] mL; median, 131 mL) in the cross-validation dataset and from 5 to 2094 mL (mean [SD], 469 [499] mL; median, 332 mL) in the test dataset. In the test segmentation dataset, 3 had hyperdense fluid, 5 had pleural thickening, 5 had gas, and 5 were loculated. The sample size of the test dataset was confirmed after testing the ensemble of models with an ICC of 0.993 (95% confidence interval [CI], 0.98-1.00; power, 0.90). Therefore, the following accuracy and performance measures are based on the test dataset.
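Effusion volumes such as those reported here follow from the number of segmented voxels and the voxel spacing, and paired manual and automated volumes can then be summarized with Bland-Altman bias and 95% limits of agreement. The sketch below is illustrative only: the function names, the voxel spacing, and the toy volumes are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def volume_ml(n_voxels: int, spacing_mm: Tuple[float, float, float]) -> float:
    """Convert a voxel count to millilitres (1 mL = 1000 mm^3)."""
    sx, sy, sz = spacing_mm
    return n_voxels * sx * sy * sz / 1000.0


def bland_altman(
    reference: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return bias and the lower/upper 95% limits of agreement (bias ± 1.96 SD)."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical mask of 200,000 voxels at 0.8 x 0.8 x 1.0 mm spacing.
example_volume = volume_ml(200_000, (0.8, 0.8, 1.0))

# Toy paired volumes in mL (manual reference vs automated segmentation).
manual_ml = [100.0, 200.0, 300.0, 400.0]
automated_ml = [110.0, 190.0, 310.0, 400.0]
bias, lower, upper = bland_altman(manual_ml, automated_ml)
```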
In the classification cohort, 147 patients had simple pleural effusions (no pleural complexity feature) and 188 patients had complex effusions (1 complexity feature, n = 84; multiple complexity features, n = 104), with a total of n = 17 with hyperdense fluid, n = 95 with pleural thickening, n = 100 with gas, and n = 128 with loculation. Of the 208 CT studies with contrast agent administration, 63 patients had visible pleural enhancement.

Detection of Pleural Effusion
With the radiological reports as the reference standard, the sensitivity for detection of pleural effusion was 0.99 (95% CI, 0.91-1.00) and the specificity was 0.98 (95% CI, 0.95-1.00). The AUC for the segmentation cohort (both validation and test data) was 0.996 (95% CI, 0.97-1.00). Figure 3 shows an example of segmentation and Table 1 summarizes the diagnostic accuracy measures. Failure analysis of incorrectly classified cases can be found in Supplementary Digital Content 9 (Figure), http://links.lww.com/RLI/A697.

Classification of Pleural Effusion
The initial training of the classification models identified all interpretable complexity features as informative features. Figure 4 shows 2 examples of model input. From the radiomics features, mostly pleural "shape features" (elongation, flatness, least axis length, maximum 2D diameter, maximum 2D diameter, mesh volume, minor axis length, sphericity, surface area, and surface volume ratio and voxel volume) were integrated during the preliminary training. The most informative features depended on the classification task and were Ƒ pleura_rate and Ƒ hyper_rate for "pleural thickening"; Ƒ inout_rate , Ƒ inout_ratio_index for "hyperdense fluid"; Neighborhood Grey Tone Difference Matrix (NGTDM) strength and NGTDM busyness for "gas"; and Ƒ hyper_rate and Ƒ inout_ratio_index for "loculation" (see Table, Supplementary Digital Content 12, http://links.lww.com/RLI/A700).

DISCUSSION
We developed and comprehensively analyzed an algorithm for the automated detection and segmentation of pleural effusions and introduced strategies for the classification between simple and complex pleural effusions. A highly sensitive detection (0.99; 95% CI, 0.91-1.00) and a robust segmentation (Dice: 0.84; 95% CI, 0.80-0.88) were achieved. The classification between simple and complex pleural effusion resulted in a modest sensitivity of 0.67 and a moderate specificity of 0.75, whereas the random-forest algorithms incorporated both radiomics and radiologically interpretable complexity features, such as density values and their distribution in the pleural cavity.
First, the performance of a widely adopted deep learning-based segmentation method 2 was tested in a clinical dataset, systematically containing both simple and complex pleural effusions, as well as patients without effusions. An accurate detection rate was also shown in the external NLST dataset. 23 Similar to other deep learning-based nonpleural segmentation tasks, 33-35 the detection and segmentation accuracy was high, irrespective of effusion complexity, laterality, effusion volume, and previous application of contrast agents. Previously, computer vision methods have been used for automated pleural effusion segmentation on limited CT sample sizes. 9,10 The proposed segmentation algorithm provides robust volumetric results in a large and heterogeneous clinical sample and therefore might have implications for clinical use and offers the potential for prognostication. 19 The segmentation algorithm was validated on the PleThora dataset, 24 consisting of tumor-associated effusions, and provided a good volumetry with an ICC of 0.97. Dice coefficient and absolute volume difference were inferior compared with the test dataset, partially explained by inconsistencies of human-delineated segmentations in the PleThora dataset, whereas our algorithm tends to primarily segment similar densities. Previously, effusions have been detected and (semiquantitatively) quantified in chest radiography, 36 although sonography is superior in detecting effusions, which in turn is limited in effusion volumetry compared to CT. 37,38 In contrast, if applied broadly and systematically, our proposed algorithm has the potential to be utilized for reliable follow-up measurements.
Second, for the classification of pleural effusions, we defined "complex" effusions as opposed to serous or "simple" effusions. The former category subsumes various pleural diagnoses (ie, hemothorax, empyema, malignant effusion, and pneumothorax), which radiologically have partially overlapping complexity features, 39,40 often used in decision making [15][16][17] and prognostication. 18,40 The classification task identified the prespecified complexity features as informative, whereas the addition of the radiomic features further leveraged diagnostic accuracy. The classification between simple and complex effusions showed a moderate performance with an AUC of 0.77, whereas classification for the separate pleural features ranged between an AUC of 0.52 for hyperdense fluid and an AUC of 0.91 for pleural thickening. This can be partially explained by the moderate diagnostic accuracy of the CT with its predominantly high specificity and lower sensitivity for different pleural diseases. 12,41 The relatively high NPVs can aid in the identification of complex pleural effusions, yet the low positive predictive values indicate the necessity of a radiological evaluation. Still, an objectifying visualization of the automated results is pivotal to familiarize radiologists with automated (yet non-black-box) tools, as we have previously shown in other volumetric tasks. 42 The introduction of shape and textural features has been proposed to overcome the varying interrater agreements with regard to the classification of complex pleural lesions. 43 Interestingly, in the present study, most of the radiomic features were discarded in the pretraining selection step, whereas the predefined, interpretable complexity features were more relevant for the classification tasks.
Similarly, classification of tumor grade prediction has previously achieved higher AUC with prespecified features based on "traditional" radiological characteristics compared with a radiomics-based model, whereas their combination showed the highest diagnostic accuracy. 44 The preference of our classification models for traditional features of pleural complexity is contributing to the ongoing discussion about the applicability of radiomics in CT. 42,45 In future research, automated pleural segmentation and classification might also contribute to better prognostication, that is, identification of treatment responders from diaphragm shape analysis. 46
There are several limitations to our work. First, eligible patients were retrospectively selected on scanners of 1 vendor at a single institution. The models' performance on examinations acquired with different scanners might differ. However, a similarly small sample size of CT scans from a new site might serve for training a custom segmentation nnU-Net model, after adopting the settings as shown in Supplementary Digital Content 3, http://links.lww.com/RLI/A691. Second, reference standards for segmentation and classification were based mainly on imaging. Nevertheless, the reported features had been validated by at least 3 radiologists (one of whom was board certified). Third, the absolute number of patients with hemothorax in the classification cohort was relatively low. This was probably the cause for the low diagnostic accuracy of the classification algorithm for hyperdense fluid, which could be improved in the future by increasing the sample size.

Figure: ROC related to the entire test dataset classification cohort (n = 101). The thick blue line represents the prediction for "simple effusion." For cases with additional pleural complexity features, the respective ROCs of the 4 models are shown. The number of positive features in the test dataset is 39 for "no complexity feature," 25 for "pleural thickening," 33 for "gas," 38 for "loculation," and 5 for "hyperdense fluid."

Implications for Practice
Automatic detection and robust segmentation of pleural effusions in chest CTs allow for routine use without interaction, 3-dimensional volumetry, and rapid quantification. The proposed classification can be used to identify pleural effusions with and without pleural complexity features, and thus, radiologists can be aided in the diagnoses of patients with empyema, hemothorax, or pneumothorax. The trained models are openly available on a public repository.